Abstract

It is a common contention that it is an "impossible mission" to exactly determine the
minimum sample size for the estimation of a binomial parameter with prescribed margin of
error and confidence level. In this paper, we investigate this very old but extremely
important problem and demonstrate that the difficulty of obtaining the exact solution is not
insurmountable. Unlike the classical approximate sample size method based on the central
limit theorem, we develop a new approach for computing the minimum sample size that does
not require any approximation. Moreover, our approach overcomes the conservatism of existing
rigorous sample size methods derived from Bernoulli's theorem or Chernoff bounds.

Our computational machinery consists of two essential ingredients. First, we prove that the
minimum of coverage probability with respect to a binomial parameter bounded in an interval
is attained at a discrete set of finitely many values of the binomial parameter. This allows for
reducing infinitely many evaluations of coverage probability to finitely many evaluations. Second,
a recursive bounding technique is developed to further improve the efficiency of computation.

1 Introduction 

The estimation of a binomial parameter is a fundamental problem in probability and statistics. 
The practical importance of this estimation problem can be seen from its numerous applications in
various fields of science and engineering. Specifically, the problem is formulated as follows.

Let $X$ be a Bernoulli random variable defined in a probability space $(\Omega, \mathscr{F}, \Pr)$ such that
$\Pr\{X = 1\} = p$ and $\Pr\{X = 0\} = 1 - p$ with $p \in (0, 1)$. It is a frequent problem to estimate $p$
based on $n$ identical and independent samples $X_1, \cdots, X_n$ of $X$. The parameter $p$ is referred to
as a binomial parameter, since it defines a binomial experiment for a given sample size $n$.

An estimate of $p$ is conventionally taken as $\widehat{p}_n = \frac{\sum_{i=1}^{n} X_i}{n}$. The nice property of this estimate is
that it is the maximum likelihood estimate and possesses minimum variance among all unbiased estimates.
A crucial question in the estimation is as follows:

Given the knowledge that $p$ belongs to the interval $[a, b]$, what is the minimum sample size $n$ that
guarantees that the difference between $\widehat{p}_n$ and $p$ is bounded within some prescribed margin of error
with a confidence level higher than a prescribed value?

It is generally believed that an exact answer to this fundamental question is not possible 
(see, e.g., [B] and the references therein). However, our recent investigation shows that an exact 
solution can be found by combining the power of mathematical analysis and modern computers. 
The main contribution of this paper is to provide an exact answer to this important question. In 
contrast to existing methods in the literature, we aim at finding rigorous solutions while avoiding 
unnecessary conservatism. 

The paper is organized as follows. In Section 2, the techniques for computing the minimum
sample size are developed with the margin of error taken as a bound of absolute error. In Section
3, we derive the corresponding sample size method by using a relative error bound as the margin of
error. In Section 4, we develop techniques for computing the minimum sample size with a mixed error
criterion. Section 5 is the conclusion. The proofs are given in the Appendices.

Throughout this paper, we shall use the following notations. The set of integers is denoted
by $\mathbb{Z}$. The ceiling function and floor function are denoted respectively by $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ (i.e., $\lceil x \rceil$
represents the smallest integer no less than $x$; $\lfloor x \rfloor$ represents the largest integer no greater than
$x$). For non-negative integer $m$, the combinatoric function $\binom{m}{z}$ with respect to integer $z$ means

\[
\binom{m}{z} =
\begin{cases}
\frac{m!}{z!\,(m-z)!} & \text{for } 0 \le z \le m, \\
0 & \text{for } z < 0 \text{ or } z > m.
\end{cases}
\]

The other notations will be made clear as we proceed.

2 Control of Absolute Error

In this section, we shall review the classical sample size problem and the existing solutions. In
particular, we shall elaborate on the difficulty that has been considered insurmountable in the
literature. We will demonstrate that this seemingly insurmountable difficulty can be made to
disappear by a careful analysis of the coverage probability.

Formally, the classical sample size problem is stated as follows. Let $\varepsilon \in (0, 1)$ be the margin
of absolute error and $\delta \in (0, 1)$ be the confidence parameter. In many applications, it is desirable
to find the smallest sample size $n$ such that

\[
\Pr\{ |\widehat{p}_n - p| < \varepsilon \} > 1 - \delta \tag{1}
\]

for any $p \in [a, b]$. Here $\Pr\{ |\widehat{p}_n - p| < \varepsilon \}$ is referred to as the coverage probability. The interval
$[a, b]$ is introduced to take into account the knowledge of $p$. If no knowledge of $p$ is available, the
interval $[a, b]$ can be taken as $[0, 1]$.

The classical sample size problem associated with (1) has been extensively studied in the
literature. As pointed out in pages 83-84 of [U], it is commonly believed that the exact computation
of the minimum sample size is impossible. This is due to the intuition that, for suitably chosen
$k_1$ and $k_2$,

\[
\Pr\{ |\widehat{p}_n - p| < \varepsilon \} = \sum_{k=k_1}^{k_2} \binom{n}{k} p^k (1 - p)^{n-k}, \tag{2}
\]

where both the summand and $k_1$, $k_2$ depend on the unknown value $p$, making the direct use of
(2) and therefore (1) "almost impossible in practice." Such an argument is very typical and can be
seen in page 84, lines 1-6 of [B]. In general, one tends to think that infinitely many evaluations
of the right-hand side of (2) are required to determine whether the coverage probability is greater
than $1 - \delta$ for any $p$ in the interval $[a, b]$. Motivated by this seemingly prohibitive computational
complexity, statisticians have settled for approximations or conservative bounds for
the minimum sample size associated with (1).

The conventional solution is based on the normal approximation (see, e.g., O H] and the
references therein). The drawback of this sample size method is that the coverage probability in
(1) may be significantly below the prescribed confidence level $1 - \delta$. This can be an extremely
severe problem in the case that the upper bound, $b$, of the binomial parameter is small. Such
criticism is common, as can be seen in IU|5] and many other works. The issue of inaccuracy
remains significant even for the case that no information on $p$ is available, i.e., $[a, b] = [0, 1]$. In
this case, the minimum sample size is approximated as

\[
n \approx \frac{Z_{\delta/2}^2}{4 \varepsilon^2}, \tag{3}
\]

where $Z_{\delta/2}$ satisfies $\int_{Z_{\delta/2}}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = \frac{\delta}{2}$. Application of the approximate formula (3)
introduces an unknown error in reporting the statistical accuracy of the estimation of $p$. In order
to eliminate the inaccuracy of the normal approximation, one can resort to large deviation type
inequalities to derive an upper bound for the minimum sample size. A well-known result is the
Chernoff bound, which asserts that (1) is true for any $p \in [0, 1]$ provided that

\[
n > \frac{\ln \frac{2}{\delta}}{2 \varepsilon^2}. \tag{4}
\]

The Chernoff bound significantly improves upon the sample size bound provided by the famous
Bernoulli theorem, which states that (1) is ensured for any $p \in [0, 1]$ if

\[
n \ge \frac{1}{4 \varepsilon^2 \delta}. \tag{5}
\]

The major problem of sample size formulas (4) and (5) is their undue conservativeness. The sample
size obtained from (4) or (5) can be substantially larger than the minimum sample size.
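To make the gap between the three classical rules concrete, the following is a minimal Python sketch that evaluates them for a given margin and confidence parameter. It assumes the forms $n \approx Z_{\delta/2}^2 / (4\varepsilon^2)$, $n > \ln(2/\delta) / (2\varepsilon^2)$ and $n \ge 1/(4\varepsilon^2\delta)$ for (3), (4) and (5); the function names are illustrative and not part of the paper.

```python
import math
from fractions import Fraction
from statistics import NormalDist


def normal_approx_n(eps, delta):
    """Formula (3), rounded up: Z^2 / (4 eps^2) with Z the upper delta/2 normal quantile."""
    z = NormalDist().inv_cdf(1 - float(delta) / 2)
    return math.ceil(z * z / (4 * float(eps) ** 2))


def chernoff_n(eps, delta):
    """Formula (4): smallest integer n with n > ln(2/delta) / (2 eps^2)."""
    bound = math.log(2 / float(delta)) / (2 * float(eps) ** 2)
    return math.floor(bound) + 1


def bernoulli_n(eps, delta):
    """Formula (5): smallest integer n with n >= 1 / (4 eps^2 delta), in exact arithmetic."""
    return math.ceil(1 / (4 * Fraction(eps) ** 2 * Fraction(delta)))


if __name__ == "__main__":
    eps, delta = Fraction(1, 10), Fraction(1, 20)
    print(normal_approx_n(eps, delta), chernoff_n(eps, delta), bernoulli_n(eps, delta))
```

For $\varepsilon = 0.1$ and $\delta = 0.05$ the three rules give 97, 185 and 500 respectively, whereas Table 1 reports an exact minimum of 101: the normal approximation falls short of the exact value, while the two rigorous bounds overshoot it considerably.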
Since one of the fundamental goals of statistics is to provide rigorous and least conservative
quantification of uncertainty in statistical inference, it is a persistent concern of statisticians and
practitioners to determine the exact value of the minimum sample size associated with (1). After a
thorough investigation, we discovered that the exact determination of the minimum sample size is
readily tractable with modern computational power by taking advantage of the behavior of the
coverage probability characterized by Theorem 1 as follows.

Theorem 1 Let $0 < \varepsilon < 1$ and $0 \le a < b \le 1$. Let $X_1, \cdots, X_n$ be identical and independent
Bernoulli random variables such that, for $i = 1, \cdots, n$, $\Pr\{X_i = 1\} = 1 - \Pr\{X_i = 0\} = p$ with
$p \in [a, b]$. Let $\widehat{p}_n = \frac{\sum_{i=1}^{n} X_i}{n}$. Then, the minimum of $\Pr\{ |\widehat{p}_n - p| < \varepsilon \}$ with respect to $p \in [a, b]$ is
achieved at the finite set $\{a, b\} \cup \{ \frac{\ell}{n} + \varepsilon \in (a, b) : \ell \in \mathbb{Z} \} \cup \{ \frac{\ell}{n} - \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$, which has less
than $2n(b - a) + 4$ elements.

See Appendix A for a proof. The application of Theorem 1 in the computation of the minimum
sample size is straightforward. For a fixed sample size $n$, since the minimum of the coverage probability with
$p \in [a, b]$ is attained at a finite set, it can be determined by a computer whether the sample size $n$
is large enough to ensure (1) for any $p \in [a, b]$. Starting from $n = 2$, one can find the minimum
sample size by gradually incrementing $n$ and checking whether $n$ is large enough.
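The search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used to produce the tables: it uses exact rational arithmetic (`fractions.Fraction`) so that the strict inequalities are decided without rounding, it evaluates every candidate point by a direct binomial sum, and the function names and the `n_max` cap are hypothetical.

```python
from fractions import Fraction
from math import comb, floor, ceil


def coverage(n, p, eps):
    """Exact Pr{|K/n - p| < eps} for K ~ Binomial(n, p), returned as a Fraction."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(Fraction(k, n) - p) < eps)


def candidate_points(n, a, b, eps):
    """Finite set of Theorem 1: {a, b} together with l/n + eps and l/n - eps inside (a, b)."""
    pts = {a, b}
    for l in range(floor(n * (a - eps)), ceil(n * (b + eps)) + 1):
        for p in (Fraction(l, n) + eps, Fraction(l, n) - eps):
            if a < p < b:
                pts.add(p)
    return pts


def min_coverage(n, a, b, eps):
    """Minimum coverage probability over [a, b], evaluated on the finite set only."""
    return min(coverage(n, p, eps) for p in candidate_points(n, a, b, eps))


def min_sample_size(eps, delta, a=Fraction(0), b=Fraction(1), n_max=500):
    """Smallest n >= 2 with minimum coverage > 1 - delta, or None if n_max is reached."""
    for n in range(2, n_max + 1):
        if min_coverage(n, a, b, eps) > 1 - delta:
            return n
    return None
```

All arguments are expected to be `Fraction` instances. Exact arithmetic keeps the sketch short and unambiguous but becomes slow for large $n$; the results for practical values of $\varepsilon$ and $\delta$ can be compared against Table 1.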

By the symmetry $\Pr\{ |(1 - \widehat{p}_n) - (1 - p)| < \varepsilon \} = \Pr\{ |\widehat{p}_n - p| < \varepsilon \}$, we can restrict
$p$ to a smaller range $[a', b']$ such that

\[
\min_{p \in [a, b]} \Pr\{ |\widehat{p}_n - p| < \varepsilon \} = \min_{p \in [a', b']} \Pr\{ |\widehat{p}_n - p| < \varepsilon \},
\]

where

\[
a' =
\begin{cases}
a & \text{for } a + b \le 1, \\
1 - b & \text{for } a + b > 1,
\end{cases}
\qquad
b' =
\begin{cases}
b & \text{for } b \le \frac{1}{2}, \\
\frac{1}{2} & \text{for } a \le \frac{1}{2} < b, \\
1 - a & \text{for } a > \frac{1}{2}.
\end{cases}
\]

Clearly, $0 \le a' < b' \le \frac{1}{2}$ and $b' - a' \le b - a$. Hence, without loss of any generality, we can assume
$0 \le a \le p \le b \le \frac{1}{2}$.

As can be seen from Theorem 1 for $a = 0$ and $b = \frac{1}{2}$, the total number of binomial summations
to be evaluated is no more than $n + 2$, since the coverage probability for $a = 0$ is one and no
computation is needed.
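The reduction of $[a, b]$ to $[a', b']$ is a two-line case distinction. The following Python sketch spells it out; the function name `reduce_interval` is illustrative, and the boundary conventions ($\le$ versus $<$) are an assumption about the piecewise definition above.

```python
from fractions import Fraction


def reduce_interval(a, b):
    """Map [a, b] to the reflected interval [a', b'] contained in [0, 1/2]."""
    half = Fraction(1, 2)
    a_red = a if a + b <= 1 else 1 - b
    if b <= half:
        b_red = b          # interval already lies in [0, 1/2]
    elif a <= half:
        b_red = half       # interval straddles 1/2
    else:
        b_red = 1 - a      # interval lies in (1/2, 1]: reflect it
    return a_red, b_red
```

For example, $[0, 1]$ reduces to $[0, \frac{1}{2}]$ and $[\frac{3}{5}, \frac{9}{10}]$ reduces to $[\frac{1}{10}, \frac{2}{5}]$, so only about half as many candidate points need to be evaluated when no information on $p$ is available.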

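For computational purposes, we have

Theorem 2 Let $0 \le a < b \le \frac{1}{2}$ and $0 < \varepsilon < \frac{1}{2}$. Define

\[
\mathscr{S} = \{ c_+(\ell) \mid 1 + \lfloor n(a - \varepsilon) \rfloor \le \ell \le \lceil n(b - \varepsilon) \rceil - 1 \} \cup \{ c_a, c_b \}
\cup \{ c_-(\ell) \mid 1 + \lfloor n(a + \varepsilon) \rfloor \le \ell \le \lceil n(b + \varepsilon) \rceil - 1 \}
\]

where, with $B(n, k, p) = \binom{n}{k} p^k (1 - p)^{n-k}$ denoting the binomial probability,

\[
c_a = \sum_{k = \lfloor n(a - \varepsilon) \rfloor + 1}^{\lceil n(a + \varepsilon) \rceil - 1} B(n, k, a), \qquad
c_b = \sum_{k = \lfloor n(b - \varepsilon) \rfloor + 1}^{\lceil n(b + \varepsilon) \rceil - 1} B(n, k, b),
\]

\[
c_+(\ell) = \sum_{k = \ell + 1}^{\ell - 1 + \lceil 2n\varepsilon \rceil} B\left(n, k, \frac{\ell}{n} + \varepsilon\right), \qquad
c_-(\ell) = \sum_{k = \ell + 1 - \lceil 2n\varepsilon \rceil}^{\ell - 1} B\left(n, k, \frac{\ell}{n} - \varepsilon\right) \quad \text{for } \ell \in \mathbb{Z}.
\]

Define

\[
\underline{A}(n, \theta, r, s) = B(n-1, r-1, \theta) + B\left(n, s+1, \theta + \frac{1}{n}\right) - B(n, r-1, \theta) - B(n-1, s, \theta)
\]
\[
\qquad + \frac{n-1}{2n} \left[ \underline{B}(n-2, r-2, \theta) + \underline{B}(n-2, s, \theta) - \overline{B}(n-2, r-1, \theta) - \overline{B}(n-2, s-1, \theta) \right],
\]

\[
\overline{A}(n, \theta, r, s) = B(n-1, r-1, \theta) + B\left(n, s+1, \theta + \frac{1}{n}\right) - B(n, r-1, \theta) - B(n-1, s, \theta)
\]
\[
\qquad + \frac{n-1}{2n} \left[ \overline{B}(n-2, r-2, \theta) + \overline{B}(n-2, s, \theta) - \underline{B}(n-2, r-1, \theta) - \underline{B}(n-2, s-1, \theta) \right]
\]

with $\underline{B}(n, k, \theta) = \min \left\{ B(n, k, \theta), \; B\left(n, k, \theta + \frac{1}{n}\right) \right\}$ and

\[
\overline{B}(n, k, \theta) =
\begin{cases}
\max \left\{ B(n, k, \theta), \; B\left(n, k, \theta + \frac{1}{n}\right) \right\} & \text{for } \frac{k}{n} \notin \left[\theta, \theta + \frac{1}{n}\right], \\
B\left(n, k, \frac{k}{n}\right) & \text{for } \frac{k}{n} \in \left[\theta, \theta + \frac{1}{n}\right].
\end{cases}
\]

Then, the following statements hold true:

(I) The minimum of $\Pr\{ |\widehat{p}_n - p| < \varepsilon \}$ with respect to $p \in [a, b]$ equals the minimum of $\mathscr{S}$, i.e.,
$\min_{p \in [a, b]} \Pr\{ |\widehat{p}_n - p| < \varepsilon \} = \min \mathscr{S}$.

(II) For $\lfloor n(a - \varepsilon) \rfloor < \ell \le \lceil n(b - \varepsilon) \rceil - 1$,

\[
[1 - c_+(\ell)] + \underline{A}(n, \theta_\ell, r_\ell, s_\ell) \le 1 - c_+(\ell - 1) \le [1 - c_+(\ell)] + \overline{A}(n, \theta_\ell, r_\ell, s_\ell), \tag{6}
\]

where $\theta_\ell = \frac{\ell - 1}{n} + \varepsilon$, $r_\ell = \ell + 1$ and $s_\ell = \ell - 2 + \lceil 2n\varepsilon \rceil$.

(III) For $\lfloor n(a + \varepsilon) \rfloor < \ell \le \lceil n(b + \varepsilon) \rceil - 1$,

\[
[1 - c_-(\ell)] + \underline{A}(n, \theta'_\ell, r'_\ell, s'_\ell) \le 1 - c_-(\ell - 1) \le [1 - c_-(\ell)] + \overline{A}(n, \theta'_\ell, r'_\ell, s'_\ell), \tag{7}
\]

where $\theta'_\ell = \frac{\ell - 1}{n} - \varepsilon$, $r'_\ell = \ell + 1 - \lceil 2n\varepsilon \rceil$ and $s'_\ell = \ell - 2$.

See Appendix B for a proof.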
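The bounds in (6) can be checked numerically. The Python sketch below is an illustration only: it computes the exact increment $c_+(\ell) - c_+(\ell - 1)$ together with a lower and an upper bound built from the first-order terms and from endpoint or mode values of the second-order terms. As an assumption of this sketch, the extremes of each second-order term $B(n-2, k, \cdot)$ are taken over $[\theta, \theta + \frac{1}{n}]$ with the mode located at $\frac{k}{n-2}$; all function names are hypothetical, and $n \ge 3$ with $\theta, \theta + \frac{1}{n} \in [0, 1]$ is required.

```python
from fractions import Fraction
from math import comb, ceil


def B(m, k, p):
    """Binomial probability C(m, k) p^k (1-p)^(m-k); zero when k is out of range."""
    if k < 0 or k > m:
        return Fraction(0)
    return comb(m, k) * p**k * (1 - p)**(m - k)


def S(n, r, s, p):
    """Sum of B(n, k, p) for k = r, ..., s."""
    return sum((B(n, k, p) for k in range(r, s + 1)), Fraction(0))


def B_min(m, k, lo, hi):
    """Minimum of the unimodal map p -> B(m, k, p) over [lo, hi] (an endpoint)."""
    return min(B(m, k, lo), B(m, k, hi))


def B_max(m, k, lo, hi):
    """Maximum of p -> B(m, k, p) over [lo, hi]; the mode k/m is used if it lies inside."""
    if 0 <= k <= m and lo <= Fraction(k, m) <= hi:
        return B(m, k, Fraction(k, m))
    return max(B(m, k, lo), B(m, k, hi))


def A_exact(n, theta, r, s):
    """Exact increment S(n, r, s+1, theta + 1/n) - S(n, r-1, s, theta)."""
    return S(n, r, s + 1, theta + Fraction(1, n)) - S(n, r - 1, s, theta)


def A_bounds(n, theta, r, s):
    """Lower and upper bounds on A_exact from a second-order Taylor expansion."""
    lo, hi = theta, theta + Fraction(1, n)
    first = (B(n - 1, r - 1, lo) + B(n, s + 1, hi)
             - B(n, r - 1, lo) - B(n - 1, s, lo))
    w, m = Fraction(n - 1, 2 * n), n - 2
    lower = first + w * (B_min(m, r - 2, lo, hi) + B_min(m, s, lo, hi)
                         - B_max(m, r - 1, lo, hi) - B_max(m, s - 1, lo, hi))
    upper = first + w * (B_max(m, r - 2, lo, hi) + B_max(m, s, lo, hi)
                         - B_min(m, r - 1, lo, hi) - B_min(m, s - 1, lo, hi))
    return lower, upper
```

With $\theta_\ell$, $r_\ell$ and $s_\ell$ chosen as in statement (II), `A_exact` equals $c_+(\ell) - c_+(\ell - 1)$, so adding the two values returned by `A_bounds` to bounds on $1 - c_+(\ell)$ gives bounds on $1 - c_+(\ell - 1)$ at the cost of a handful of binomial terms instead of a full summation.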

For the purpose of reducing roundoff error, we shall evaluate the complementary probabilities
$1 - c_+(\ell)$, $1 - c_-(\ell)$, $1 - c_a$, $1 - c_b$ and compare them with $\delta$ to determine whether the sample size
$n$ is large enough. Since the comparison between $1 - c_+(\ell)$ and $\delta$ usually only requires bounds of
$1 - c_+(\ell)$, a large amount of computation can be saved if we start from $\ell = \lceil n(b - \varepsilon) \rceil - 1$ and
recursively build the bounds of $1 - c_+(\ell - 1)$ from the bounds of $1 - c_+(\ell)$ by making use of (6).
Similarly, we can apply (7) to reduce the computation for the comparison between $1 - c_-(\ell)$ and
$\delta$. This computational technique is especially useful when the sample size is large. Due to its
recursive nature, we call it the recursive bounding technique. It should be noted that the bounds
may become too conservative to be useful as the number of recursive steps increases. In that
situation, the recursive process should be restarted with exact computation for the current index
$\ell$.
Table 1: Table of Sample Sizes

$\varepsilon$   $\delta$   $n$        $\varepsilon$   $\delta$   $n$        $\varepsilon$   $\delta$   $n$
0.1     0.05    101        0.1     0.01    171        0.1     0.001   276
0.05    0.05    391        0.05    0.01    671        0.05    0.001   1091
0.01    0.05    9651       0.01    0.01    16601      0.01    0.001   27101
0.005   0.05    38501      0.005   0.01    66401      0.005   0.001   108301
0.001   0.05    960501     0.001   0.01    1659001    0.001   0.001   2707001

In order to reduce computational effort, the evaluation should be performed earlier for $\frac{\ell}{n} \pm \varepsilon$
closer to $\frac{1}{2}$, so that it can be determined as early as possible whether the sample size is sufficiently large.
This computational trick is motivated by our computational experience that, for many values of
$\ell$, $1 - c_+(\ell)$ is non-decreasing with respect to $\ell$. The situation is similar for $1 - c_-(\ell)$.

To demonstrate the feasibility of our computational method, we provide some sample size
values in Table 1 for the case that $[a, b] = [0, 1]$, i.e., no information on $p$ is available. In fact, with
a few hours of computer running time, we have produced a MATLAB data file for a large number
of combinations of margin of absolute error and confidence level. Although the computational
complexity of our approach is much higher than that of existing explicit formulas, the computer
running time is not an issue since a large data file of sample sizes can be created once and saved for
future use.

3 Control of Relative Error 

Let $\varepsilon \in (0, 1)$ be the margin of relative error and $\delta \in (0, 1)$ be the confidence parameter. It is
of interest to determine the smallest sample size $n$ so that

\[
\Pr\left\{ \frac{|\widehat{p}_n - p|}{p} < \varepsilon \right\} > 1 - \delta
\]

for any $p \in [a, b]$. As has been pointed out in Section 2, an essential step is to reduce infinitely
many evaluations of the coverage probability $\Pr\{ |\widehat{p}_n - p| < \varepsilon p \}$ to finitely many evaluations. Such
reduction can be accomplished by making use of Theorem 3 as follows.

Theorem 3 Let $0 < \varepsilon < 1$ and $0 \le a < b \le 1$. Let $X_1, \cdots, X_n$ be identical and independent
Bernoulli random variables such that, for $i = 1, \cdots, n$, $\Pr\{X_i = 1\} = 1 - \Pr\{X_i = 0\} = p$ with
$p \in [a, b]$. Let $\widehat{p}_n = \frac{\sum_{i=1}^{n} X_i}{n}$. Then, the minimum of $\Pr\left\{ \frac{|\widehat{p}_n - p|}{p} < \varepsilon \right\}$ with respect to $p \in [a, b]$ is
achieved at the finite set $\{a, b\} \cup \left\{ \frac{\ell}{n(1 + \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\} \cup \left\{ \frac{\ell}{n(1 - \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$, which has
less than $2n(b - a) + 4$ elements.

See Appendix C for a proof. For computational convenience, we have
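Theorem 4 Let $0 \le a < b \le 1$ and $0 < \varepsilon < 1$. Define

\[
\mathscr{S}_{\mathrm{rev}} = \{ c_+(\ell) \mid 1 + \lfloor na(1 - \varepsilon) \rfloor \le \ell \le \lceil nb(1 - \varepsilon) \rceil - 1 \} \cup \{ c_a, c_b \}
\cup \{ c_-(\ell) \mid 1 + \lfloor na(1 + \varepsilon) \rfloor \le \ell \le \lceil nb(1 + \varepsilon) \rceil - 1 \}
\]

where

\[
c_a = \sum_{k = \lfloor na(1 - \varepsilon) \rfloor + 1}^{\lceil na(1 + \varepsilon) \rceil - 1} B(n, k, a), \qquad
c_b = \sum_{k = \lfloor nb(1 - \varepsilon) \rfloor + 1}^{\lceil nb(1 + \varepsilon) \rceil - 1} B(n, k, b),
\]

\[
c_+(\ell) = \sum_{k = \ell + 1}^{\left\lceil \frac{\ell (1 + \varepsilon)}{1 - \varepsilon} \right\rceil - 1} B\left(n, k, \frac{\ell}{n(1 - \varepsilon)}\right), \qquad
c_-(\ell) = \sum_{k = \left\lfloor \frac{\ell (1 - \varepsilon)}{1 + \varepsilon} \right\rfloor + 1}^{\ell - 1} B\left(n, k, \frac{\ell}{n(1 + \varepsilon)}\right) \quad \text{for } \ell \in \mathbb{Z}.
\]

Then, the minimum of $\Pr\{ |\widehat{p}_n - p| < \varepsilon p \}$ with respect to $p \in [a, b]$ equals the minimum of $\mathscr{S}_{\mathrm{rev}}$,
i.e., $\min_{p \in [a, b]} \Pr\{ |\widehat{p}_n - p| < \varepsilon p \} = \min \mathscr{S}_{\mathrm{rev}}$.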

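In fact, a recursive bounding technique similar to that of Theorem 2 can be developed to
improve the computational efficiency. Theorem 4 can be proved by applying Theorem 3 and the
following observations:

(i) $\Pr\{ |\widehat{p}_n - p| < \varepsilon p \}$ assumes values $c_a$ and $c_b$ for $p = a$ and $b$ respectively.

(ii) For $p = \frac{\ell}{n(1 + \varepsilon)} \in \left\{ \frac{\ell}{n(1 + \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$, we have $\lceil np(1 + \varepsilon) \rceil - 1 = \ell - 1$,
$\lfloor np(1 - \varepsilon) \rfloor + 1 = \left\lfloor \frac{\ell (1 - \varepsilon)}{1 + \varepsilon} \right\rfloor + 1$, $c_-(\ell) = \Pr\{ |\widehat{p}_n - p| < \varepsilon p \}$ and $1 + \lfloor na(1 + \varepsilon) \rfloor \le \ell \le \lceil nb(1 + \varepsilon) \rceil - 1$.

(iii) For $p = \frac{\ell}{n(1 - \varepsilon)} \in \left\{ \frac{\ell}{n(1 - \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$, we have $\lfloor np(1 - \varepsilon) \rfloor + 1 = \ell + 1$ and
$\lceil np(1 + \varepsilon) \rceil - 1 = \left\lceil \frac{\ell (1 + \varepsilon)}{1 - \varepsilon} \right\rceil - 1$, $c_+(\ell) = \Pr\{ |\widehat{p}_n - p| < \varepsilon p \}$ and $1 + \lfloor na(1 - \varepsilon) \rfloor \le \ell \le \lceil nb(1 - \varepsilon) \rceil - 1$.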
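For the relative error criterion, the finite set of Theorem 3 can be enumerated in the same way as in the absolute error case. The following Python sketch is an illustration only, using exact rational arithmetic and direct binomial sums; the function names are hypothetical and all arguments are expected to be `Fraction` instances with $a > 0$.

```python
from fractions import Fraction
from math import comb, floor, ceil


def rel_coverage(n, p, eps):
    """Exact Pr{|K/n - p| < eps * p} for K ~ Binomial(n, p), returned as a Fraction."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(Fraction(k, n) - p) < eps * p)


def rel_candidates(n, a, b, eps):
    """Finite set of Theorem 3: {a, b} plus l/(n(1+eps)) and l/(n(1-eps)) inside (a, b)."""
    pts = {a, b}
    for l in range(floor(n * a * (1 - eps)), ceil(n * b * (1 + eps)) + 1):
        for p in (l / (n * (1 + eps)), l / (n * (1 - eps))):
            if a < p < b:
                pts.add(p)
    return pts


def rel_min_coverage(n, a, b, eps):
    """Minimum coverage probability over [a, b] under the relative error criterion."""
    return min(rel_coverage(n, p, eps) for p in rel_candidates(n, a, b, eps))
```

A sample size $n$ is large enough for the relative error problem exactly when `rel_min_coverage(n, a, b, eps)` exceeds $1 - \delta$, so the same incremental search over $n$ as in Section 2 applies.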



4 Control of Absolute Error or Relative Error 

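Let $\varepsilon_a \in (0, 1)$ and $\varepsilon_r \in (0, 1)$ be respectively the margins of absolute error and relative error.
Let $\delta \in (0, 1)$ be the confidence parameter. In many situations, it is desirable to find the smallest
sample size $n$ such that

\[
\Pr\left\{ |\widehat{p}_n - p| < \varepsilon_a \;\; \text{or} \;\; \frac{|\widehat{p}_n - p|}{p} < \varepsilon_r \right\} > 1 - \delta \tag{8}
\]

for any $p \in [a, b]$. To make it possible to compute exactly the minimum sample size associated
with (8), we have Theorem 5 as follows.

Theorem 5 Let $0 < \varepsilon_a < 1$, $0 < \varepsilon_r < 1$ and $0 \le a < \frac{\varepsilon_a}{\varepsilon_r} < b \le 1$. Let $X_1, \cdots, X_n$ be identical and
independent Bernoulli random variables such that, for $i = 1, \cdots, n$, $\Pr\{X_i = 1\} = 1 - \Pr\{X_i = 0\} = p$
with $p \in [a, b]$. Let $\widehat{p}_n = \frac{\sum_{i=1}^{n} X_i}{n}$. Then, the minimum of $\Pr\left\{ |\widehat{p}_n - p| < \varepsilon_a \text{ or } \frac{|\widehat{p}_n - p|}{p} < \varepsilon_r \right\}$
with respect to $p \in [a, b]$ is achieved at the finite set
$\left\{ a, b, \frac{\varepsilon_a}{\varepsilon_r} \right\} \cup \left\{ \frac{\ell}{n} + \varepsilon_a \in \left(a, \frac{\varepsilon_a}{\varepsilon_r}\right) : \ell \in \mathbb{Z} \right\} \cup \left\{ \frac{\ell}{n} - \varepsilon_a \in \left(a, \frac{\varepsilon_a}{\varepsilon_r}\right) : \ell \in \mathbb{Z} \right\}
\cup \left\{ \frac{\ell}{n(1 + \varepsilon_r)} \in \left(\frac{\varepsilon_a}{\varepsilon_r}, b\right) : \ell \in \mathbb{Z} \right\} \cup \left\{ \frac{\ell}{n(1 - \varepsilon_r)} \in \left(\frac{\varepsilon_a}{\varepsilon_r}, b\right) : \ell \in \mathbb{Z} \right\}$, which has less than
$2n(b - a) + 7$ elements.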



Table 2: Table of Sample Sizes ($\varepsilon_r = 0.1$)

$\varepsilon_a$   $\delta$   $n$        $\varepsilon_a$   $\delta$   $n$        $\varepsilon_a$   $\delta$   $n$
0.05    0.05    391        0.05    0.01    671        0.05    0.001   1091
0.01    0.05    3501       0.01    0.01    6051       0.01    0.001   9801
0.005   0.05    7401       0.005   0.01    12701      0.005   0.001   20701
0.001   0.05    38501      0.001   0.01    66501      0.001   0.001   108001

As can be seen from Theorem 5 for $a = 0$ and $b = 1$, the total number of evaluations of
probability is no more than $2n + 4$. The detailed proof of Theorem 5 is omitted since it can
be deduced from Theorem 1 and Theorem 3 with the observation that

\[
\Pr\left\{ |\widehat{p}_n - p| < \varepsilon_a \;\; \text{or} \;\; \frac{|\widehat{p}_n - p|}{p} < \varepsilon_r \right\} =
\begin{cases}
\Pr\{ |\widehat{p}_n - p| < \varepsilon_a \} & \text{for } p \in \left[a, \frac{\varepsilon_a}{\varepsilon_r}\right], \\
\Pr\left\{ \frac{|\widehat{p}_n - p|}{p} < \varepsilon_r \right\} & \text{for } p \in \left(\frac{\varepsilon_a}{\varepsilon_r}, b\right].
\end{cases}
\]

This observation also indicates that the sample size problem associated with (8) can be decomposed
into the sample size problems for the cases of absolute error and relative error discussed
previously.
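The decomposition can be illustrated with a short Python sketch. It relies on the fact that the event in (8) is the same as $|\widehat{p}_n - p| < \max\{\varepsilon_a, \varepsilon_r p\}$, and it enumerates the candidate set as the absolute error lattice below $\frac{\varepsilon_a}{\varepsilon_r}$ joined with the relative error lattice above it. The function names are hypothetical, arguments are expected to be `Fraction` instances, and $a < \frac{\varepsilon_a}{\varepsilon_r} < b$ is assumed.

```python
from fractions import Fraction
from math import comb, floor, ceil


def mixed_coverage(n, p, eps_a, eps_r):
    """Exact Pr{|K/n - p| < eps_a or |K/n - p| < eps_r * p} for K ~ Binomial(n, p)."""
    margin = max(eps_a, eps_r * p)
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(Fraction(k, n) - p) < margin)


def mixed_candidates(n, a, b, eps_a, eps_r):
    """Candidate set for the mixed criterion, split at c = eps_a / eps_r."""
    c = eps_a / eps_r
    pts = {a, b, c}
    # Absolute error regime: l/n + eps_a and l/n - eps_a inside (a, c).
    for l in range(floor(n * (a - eps_a)), ceil(n * (c + eps_a)) + 1):
        for p in (Fraction(l, n) + eps_a, Fraction(l, n) - eps_a):
            if a < p < c:
                pts.add(p)
    # Relative error regime: l/(n(1+eps_r)) and l/(n(1-eps_r)) inside (c, b).
    for l in range(floor(n * c * (1 - eps_r)), ceil(n * b * (1 + eps_r)) + 1):
        for p in (l / (n * (1 + eps_r)), l / (n * (1 - eps_r))):
            if c < p < b:
                pts.add(p)
    return pts


def mixed_min_coverage(n, a, b, eps_a, eps_r):
    """Minimum coverage probability over [a, b] under the mixed criterion."""
    return min(mixed_coverage(n, p, eps_a, eps_r)
               for p in mixed_candidates(n, a, b, eps_a, eps_r))
```

The split at $\frac{\varepsilon_a}{\varepsilon_r}$ is what keeps the candidate set small: on each side only one of the two lattices matters, so the number of evaluations grows linearly in $n$ as stated after Theorem 5.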

To show the effectiveness of our sample size method, we present some sample size values in
Table 2 for the case that no information on the binomial parameter is available, i.e., $[a, b] = [0, 1]$.
We would like to remark that the computation can be easily managed by any personal computer.

Finally, we would like to point out that similar characteristics of the coverage probability 
can be shown for the problem of estimating a Poisson parameter or the proportion of a finite 
population, which allows for the exact computation of minimum sample size. For details, see our 
recent papers [HIS]. 



5 Conclusion 

Determining sample size is a very important issue because samples that are too large may waste
time, resources and money, while samples that are too small may lead to inaccurate results. We
have developed an exact method for the computation of the minimum sample size for the estimation
of binomial parameters, which is not computationally demanding. Our sample size method permits
rigorous control of statistical sampling error. Exact minimum sample sizes that were previously unavailable
are obtained by implementing the new method on a personal computer. In particular, for the convenience
of practitioners, we have obtained a MATLAB data file of sample sizes for a very large number
of combinations of margin of error and confidence level, which is available upon request. It
is hoped that our sample size method can be useful for improving the rigor and efficiency of
statistical inference on the very old estimation problem of binomial parameters.
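A Proof of Theorem 1

We denote the number of successes as $K = \sum_{i=1}^{n} X_i$. Define

\[
C(p) = \Pr\left\{ \left| \frac{K}{n} - p \right| < \varepsilon \right\} = \Pr\{ g(p) \le K \le h(p) \}
\]

where

\[
g(p) = \lfloor n(p - \varepsilon) \rfloor + 1, \qquad h(p) = \lceil n(p + \varepsilon) \rceil - 1.
\]

It should be noted that $C(p)$, $g(p)$ and $h(p)$ are actually multivariate functions of $p$, $\varepsilon$ and $n$. For
simplicity of notation, we drop the arguments $n$ and $\varepsilon$ throughout the proof of Theorem 1.
Here and below, $S(n, g, h, p) = \sum_{k=g}^{h} B(n, k, p)$ with $B(n, k, p) = \binom{n}{k} p^k (1 - p)^{n-k}$, so that
$C(p) = S(n, g(p), h(p), p)$. We need some preliminary results.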



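Lemma 1 Let $p_\ell = \frac{\ell}{n} - \varepsilon$ where $\ell \in \mathbb{Z}$. Then, $h(p) = h(p_{\ell+1}) = \ell$ for any $p \in (p_\ell, p_{\ell+1})$.

Proof. For $p \in (p_\ell, p_{\ell+1})$, we have $0 < n(p - p_\ell) < 1$ and

\[
\begin{aligned}
h(p) &= \lceil n(p + \varepsilon) \rceil - 1 \\
&= \lceil n(p_\ell + \varepsilon + p - p_\ell) \rceil - 1 \\
&= \left\lceil n \left( \frac{\ell}{n} - \varepsilon + \varepsilon + p - p_\ell \right) \right\rceil - 1 \\
&= \ell - 1 + \lceil n(p - p_\ell) \rceil \\
&= \ell = \left\lceil n \left( \frac{\ell + 1}{n} - \varepsilon + \varepsilon \right) \right\rceil - 1 = h(p_{\ell+1}).
\end{aligned}
\]

□

Lemma 2 Let $p_\ell = \frac{\ell}{n} + \varepsilon$ where $\ell \in \mathbb{Z}$. Then, $g(p) = g(p_\ell) = \ell + 1$ for any $p \in (p_\ell, p_{\ell+1})$.

Proof. For $p \in (p_\ell, p_{\ell+1})$, we have $-1 < n(p - p_{\ell+1}) < 0$ and

\[
\begin{aligned}
g(p) &= \lfloor n(p - \varepsilon) \rfloor + 1 \\
&= \lfloor n(p_{\ell+1} - \varepsilon + p - p_{\ell+1}) \rfloor + 1 \\
&= \left\lfloor n \left( \frac{\ell + 1}{n} + \varepsilon - \varepsilon + p - p_{\ell+1} \right) \right\rfloor + 1 \\
&= \ell + 1 + \lfloor n(p - p_{\ell+1}) \rfloor + 1 \\
&= \ell + 1 - 1 + 1 \\
&= \left\lfloor n \left( \frac{\ell}{n} + \varepsilon - \varepsilon \right) \right\rfloor + 1 = g(p_\ell).
\end{aligned}
\]

□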
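Lemma 3 Let $\alpha < \beta$ be two consecutive elements of the ascending arrangement of all distinct
elements of $\{a, b\} \cup \{ \frac{\ell}{n} + \varepsilon \in (a, b) : \ell \in \mathbb{Z} \} \cup \{ \frac{\ell}{n} - \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$. Then, both $g(p)$ and $h(p)$
are constants for any $p \in (\alpha, \beta)$.

Proof. Since $\alpha$ and $\beta$ are two consecutive elements of the ascending arrangement of all distinct
elements of the set, it must be true that there is no integer $\ell$ such that $\alpha < \frac{\ell}{n} + \varepsilon < \beta$ or
$\alpha < \frac{\ell}{n} - \varepsilon < \beta$. It follows that there exist two integers $\ell$ and $\ell'$ such that $(\alpha, \beta) \subseteq \left( \frac{\ell}{n} + \varepsilon, \frac{\ell + 1}{n} + \varepsilon \right)$
and $(\alpha, \beta) \subseteq \left( \frac{\ell'}{n} - \varepsilon, \frac{\ell' + 1}{n} - \varepsilon \right)$. Applying Lemma 1 and Lemma 2, we have $g(p) = g\left( \frac{\ell}{n} + \varepsilon \right)$ and
$h(p) = h\left( \frac{\ell' + 1}{n} - \varepsilon \right)$ for any $p \in (\alpha, \beta)$.

□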



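Lemma 4 For any $p \in (0, 1)$, $\lim_{\eta \downarrow 0} C(p + \eta) \ge C(p)$ and $\lim_{\eta \downarrow 0} C(p - \eta) \ge C(p)$.

Proof. Observing that $h(p + \eta) \ge h(p)$ for any $\eta > 0$ and that

\[
\begin{aligned}
g(p + \eta) &= \lfloor n(p + \eta - \varepsilon) \rfloor + 1 \\
&= \lfloor n(p - \varepsilon) \rfloor + 1 + \lfloor n(p - \varepsilon) - \lfloor n(p - \varepsilon) \rfloor + n\eta \rfloor \\
&= \lfloor n(p - \varepsilon) \rfloor + 1 = g(p)
\end{aligned}
\]

for $0 < \eta < \frac{1 + \lfloor n(p - \varepsilon) \rfloor - n(p - \varepsilon)}{n}$, we have

\[
S(n, g(p + \eta), h(p + \eta), p + \eta) \ge S(n, g(p), h(p), p + \eta) \tag{9}
\]

for $0 < \eta < \frac{1 + \lfloor n(p - \varepsilon) \rfloor - n(p - \varepsilon)}{n}$. Since

\[
h(p + \eta) = \lceil n(p + \eta + \varepsilon) \rceil - 1 = \lceil n(p + \varepsilon) \rceil - 1 + \lceil n(p + \varepsilon) - \lceil n(p + \varepsilon) \rceil + n\eta \rceil,
\]

we have

\[
h(p + \eta) =
\begin{cases}
\lceil n(p + \varepsilon) \rceil & \text{for } n(p + \varepsilon) = \lceil n(p + \varepsilon) \rceil \text{ and } 0 < \eta < \frac{1}{n}, \\
\lceil n(p + \varepsilon) \rceil - 1 & \text{for } n(p + \varepsilon) \ne \lceil n(p + \varepsilon) \rceil \text{ and } 0 < \eta < \frac{\lceil n(p + \varepsilon) \rceil - n(p + \varepsilon)}{n}.
\end{cases}
\]

It follows that both $g(p + \eta)$ and $h(p + \eta)$ are independent of $\eta$ if $\eta > 0$ is small enough. Since
$S(n, g, h, p + \eta)$ is continuous with respect to $\eta$ for fixed $g$ and $h$, we have that
$\lim_{\eta \downarrow 0} S(n, g(p + \eta), h(p + \eta), p + \eta)$ exists. As a result,

\[
\begin{aligned}
\lim_{\eta \downarrow 0} C(p + \eta) &= \lim_{\eta \downarrow 0} S(n, g(p + \eta), h(p + \eta), p + \eta) \\
&\ge \lim_{\eta \downarrow 0} S(n, g(p), h(p), p + \eta) = S(n, g(p), h(p), p) = C(p),
\end{aligned}
\]

where the inequality follows from (9).

Observing that $g(p - \eta) \le g(p)$ for any $\eta > 0$ and that

\[
\begin{aligned}
h(p - \eta) &= \lceil n(p - \eta + \varepsilon) \rceil - 1 \\
&= \lceil n(p + \varepsilon) \rceil - 1 + \lceil n(p + \varepsilon) - \lceil n(p + \varepsilon) \rceil - n\eta \rceil \\
&= \lceil n(p + \varepsilon) \rceil - 1 = h(p)
\end{aligned}
\]

for $0 < \eta < \frac{1 + n(p + \varepsilon) - \lceil n(p + \varepsilon) \rceil}{n}$, we have

\[
S(n, g(p - \eta), h(p - \eta), p - \eta) \ge S(n, g(p), h(p), p - \eta) \tag{10}
\]

for $0 < \eta < \frac{1 + n(p + \varepsilon) - \lceil n(p + \varepsilon) \rceil}{n}$. Since

\[
g(p - \eta) = \lfloor n(p - \eta - \varepsilon) \rfloor + 1 = \lfloor n(p - \varepsilon) \rfloor + 1 + \lfloor n(p - \varepsilon) - \lfloor n(p - \varepsilon) \rfloor - n\eta \rfloor,
\]

we have

\[
g(p - \eta) =
\begin{cases}
\lfloor n(p - \varepsilon) \rfloor & \text{for } n(p - \varepsilon) = \lfloor n(p - \varepsilon) \rfloor \text{ and } 0 < \eta < \frac{1}{n}, \\
\lfloor n(p - \varepsilon) \rfloor + 1 & \text{for } n(p - \varepsilon) \ne \lfloor n(p - \varepsilon) \rfloor \text{ and } 0 < \eta < \frac{n(p - \varepsilon) - \lfloor n(p - \varepsilon) \rfloor}{n}.
\end{cases}
\]

It follows that both $g(p - \eta)$ and $h(p - \eta)$ are independent of $\eta$ if $\eta > 0$ is small enough. Since
$S(n, g, h, p - \eta)$ is continuous with respect to $\eta$ for fixed $g$ and $h$, we have that
$\lim_{\eta \downarrow 0} S(n, g(p - \eta), h(p - \eta), p - \eta)$ exists. Hence,

\[
\begin{aligned}
\lim_{\eta \downarrow 0} C(p - \eta) &= \lim_{\eta \downarrow 0} S(n, g(p - \eta), h(p - \eta), p - \eta) \\
&\ge \lim_{\eta \downarrow 0} S(n, g(p), h(p), p - \eta) = S(n, g(p), h(p), p) = C(p),
\end{aligned}
\]

where the inequality follows from (10).

□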



Lemma 5 Let $0 < u < v < 1$ and $g \le h$. Then,

\[
\min_{p \in [u, v]} S(n, g, h, p) = \min \{ S(n, g, h, u), \; S(n, g, h, v) \}.
\]

Proof. It can be checked that

\[
\frac{d B(n, k, p)}{dp} = n \left[ B(n-1, k-1, p) - B(n-1, k, p) \right] \tag{11}
\]

for any integer $k$. By (11), it is readily shown that

\[
\frac{d S(n, 0, l, p)}{dp} = -n B(n-1, l, p). \tag{12}
\]

To show the lemma, it suffices to consider 6 cases as follows.

Case (i): $g \le h < 0 < n$. In this case, $S(n, g, h, p) = 0$ for any $p \in [u, v]$.

Case (ii): $0 < g \le n \le h$. In this case, $S(n, g, h, p) = S(n, g, n, p) = 1 - S(n, 0, g-1, p)$, which
is increasing in view of (12).

Case (iii): $0 < n < g \le h$. In this case, $S(n, g, h, p) = 0$ for any $p \in [u, v]$.

Case (iv): $g \le 0 \le h < n$. In this case, $S(n, g, h, p) = S(n, 0, h, p)$, which is decreasing as a
result of (12).

Case (v): $g \le 0 < n \le h$. In this case, $S(n, g, h, p) = 1$ for any $p \in [u, v]$.

Clearly, the lemma is true for the above five cases.

Case (vi): $0 < g \le h < n$. Writing $C(p) = S(n, g, h, p)$ for fixed $g$ and $h$, we have by (12) that

\[
\begin{aligned}
\frac{d C(p)}{dp} &= \frac{d S(n, 0, h, p)}{dp} - \frac{d S(n, 0, g-1, p)}{dp} \\
&= \frac{n!}{(g-1)! \, (n-g)!} \, p^{g-1} (1-p)^{n-g} - \frac{n!}{h! \, (n-h-1)!} \, p^{h} (1-p)^{n-h-1} \\
&= \frac{n!}{h! \, (n-h-1)!} \, p^{h} (1-p)^{n-h-1} \left[ \frac{h! \, (n-h-1)!}{(g-1)! \, (n-g)!} \left( \frac{1-p}{p} \right)^{h-g+1} - 1 \right] \ge 0
\end{aligned}
\]

if and only if

\[
p \le 1 - \left[ 1 + \left( \frac{h! \, (n-h-1)!}{(g-1)! \, (n-g)!} \right)^{\frac{1}{h-g+1}} \right]^{-1}.
\]

Since $\frac{1-p}{p}$ is strictly decreasing in $p$, the derivative changes sign at most once, from positive to negative.
From this investigation of the derivative of $C(p)$ with respect to $p$, we can see that one
of the following three cases must be true: (1) $C(\mu)$ decreases monotonically for $\mu \in [u, v]$; (2)
$C(\mu)$ increases monotonically for $\mu \in [u, v]$; (3) there exists a number $\theta \in (u, v)$ such that $C(\mu)$
increases monotonically for $\mu \in [u, \theta]$ and decreases monotonically for $\mu \in (\theta, v]$. It follows that
the lemma must be true for Case (vi).

□



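Lemma 6 Let $\alpha < \beta$ be two consecutive elements of the ascending arrangement of all distinct
elements of $\{a, b\} \cup \{ \frac{\ell}{n} + \varepsilon \in (a, b) : \ell \in \mathbb{Z} \} \cup \{ \frac{\ell}{n} - \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$. Then, $C(p) \ge \min \{ C(\alpha), C(\beta) \}$
for any $p \in (\alpha, \beta)$.

Proof. By Lemma 3, $g(p)$ and $h(p)$ are constants for any $p \in (\alpha, \beta)$. Hence, we can drop the
argument and write $g(p) = g$, $h(p) = h$ and $C(p) = S(n, g, h, p)$.

For $p \in (\alpha, \beta)$, define the interval $[\alpha + \eta, \beta - \eta]$ with $0 < \eta < \min \left\{ p - \alpha, \; \beta - p, \; \frac{\beta - \alpha}{2} \right\}$. Then,
$p \in [\alpha + \eta, \beta - \eta]$. By Lemma 5,

\[
C(p) \ge \min_{\mu \in [\alpha + \eta, \beta - \eta]} C(\mu) = \min \{ C(\alpha + \eta), \; C(\beta - \eta) \}
\]

for $0 < \eta < \min \left\{ p - \alpha, \; \beta - p, \; \frac{\beta - \alpha}{2} \right\}$. By Lemma 4, both $\lim_{\eta \downarrow 0} C(\alpha + \eta)$ and $\lim_{\eta \downarrow 0} C(\beta - \eta)$ exist
and are bounded from below by $C(\alpha)$ and $C(\beta)$ respectively. Hence,

\[
\begin{aligned}
C(p) &\ge \lim_{\eta \downarrow 0} \min \{ C(\alpha + \eta), \; C(\beta - \eta) \} \\
&= \min \left\{ \lim_{\eta \downarrow 0} C(\alpha + \eta), \; \lim_{\eta \downarrow 0} C(\beta - \eta) \right\} \ge \min \{ C(\alpha), \; C(\beta) \}
\end{aligned}
\]

for any $p \in (\alpha, \beta)$.

□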



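We are now in a position to prove Theorem 1. The statement about the minimum of the
coverage probability follows immediately from Lemma 6. The number of elements of the discrete
set described in Theorem 1 can be calculated as follows. Since

\[
a < \frac{\ell}{n} - \varepsilon < b \iff 1 + \lfloor n(a + \varepsilon) \rfloor \le \ell \le \lceil n(b + \varepsilon) \rceil - 1,
\]

the number of elements in $\{ \frac{\ell}{n} - \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$ is

\[
\lceil n(b + \varepsilon) \rceil - \lfloor n(a + \varepsilon) \rfloor - 1 < n(b + \varepsilon) + 1 - [n(a + \varepsilon) - 1] - 1 = n(b - a) + 1.
\]

Since

\[
a < \frac{\ell}{n} + \varepsilon < b \iff 1 + \lfloor n(a - \varepsilon) \rfloor \le \ell \le \lceil n(b - \varepsilon) \rceil - 1,
\]

the number of elements in $\{ \frac{\ell}{n} + \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$ is

\[
\lceil n(b - \varepsilon) \rceil - \lfloor n(a - \varepsilon) \rfloor - 1 < n(b - \varepsilon) + 1 - [n(a - \varepsilon) - 1] - 1 = n(b - a) + 1.
\]

Hence, the total number of elements of $\{a, b\} \cup \{ \frac{\ell}{n} + \varepsilon \in (a, b) : \ell \in \mathbb{Z} \} \cup \{ \frac{\ell}{n} - \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$
is less than $2n(b - a) + 4$. This concludes the proof of Theorem 1.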



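B Proof of Theorem 2

Clearly, $\Pr\{ |\widehat{p}_n - p| < \varepsilon \} = c_a$ for $p = a$ and $\Pr\{ |\widehat{p}_n - p| < \varepsilon \} = c_b$ for $p = b$.
For $p = \frac{\ell}{n} - \varepsilon \in \{ \frac{\ell}{n} - \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$, we have

\[
\lceil n(p + \varepsilon) \rceil - 1 = \ell - 1, \qquad \lfloor n(p - \varepsilon) \rfloor + 1 = \lfloor \ell - 2n\varepsilon \rfloor + 1 = \ell + 1 - \lceil 2n\varepsilon \rceil,
\]

$c_-(\ell) = \Pr\{ |\widehat{p}_n - p| < \varepsilon \}$ and $1 + \lfloor n(a + \varepsilon) \rfloor \le \ell \le \lceil n(b + \varepsilon) \rceil - 1$.
For $p = \frac{\ell}{n} + \varepsilon \in \{ \frac{\ell}{n} + \varepsilon \in (a, b) : \ell \in \mathbb{Z} \}$, we have

\[
\lfloor n(p - \varepsilon) \rfloor + 1 = \ell + 1, \qquad \lceil n(p + \varepsilon) \rceil - 1 = \lceil \ell + 2n\varepsilon \rceil - 1 = \ell - 1 + \lceil 2n\varepsilon \rceil,
\]

$c_+(\ell) = \Pr\{ |\widehat{p}_n - p| < \varepsilon \}$ and $1 + \lfloor n(a - \varepsilon) \rfloor \le \ell \le \lceil n(b - \varepsilon) \rceil - 1$.

Hence, statement (I) of Theorem 2 can be shown by making use of the above observations and
invoking Theorem 1.

To show statements (II) and (III), consider the function $S(n, r, s, p) = \sum_{k=r}^{s} B(n, k, p)$ with $r \le s$.
Applying (12), we can show that

\[
\frac{\partial S(n, r, s, p)}{\partial p} = n \left[ B(n-1, r-1, p) - B(n-1, s, p) \right]
\]

and

\[
\frac{\partial^2 S(n, r, s, p)}{\partial p^2} = n(n-1) \left[ B(n-2, r-2, p) + B(n-2, s, p) - B(n-2, r-1, p) - B(n-2, s-1, p) \right].
\]

Define $A(n, \theta, r, s) = S\left(n, r, s+1, \theta + \frac{1}{n}\right) - S(n, r-1, s, \theta)$. Then,

\[
A(n, \theta, r, s) = B\left(n, s+1, \theta + \frac{1}{n}\right) - B(n, r-1, \theta) + S\left(n, r, s, \theta + \frac{1}{n}\right) - S(n, r, s, \theta).
\]

By Taylor's expansion formula,

\[
\begin{aligned}
S\left(n, r, s, \theta + \frac{1}{n}\right) - S(n, r, s, \theta)
&= \frac{1}{n} \left. \frac{\partial S(n, r, s, p)}{\partial p} \right|_{p = \theta} + \frac{1}{2n^2} \left. \frac{\partial^2 S(n, r, s, p)}{\partial p^2} \right|_{p = \zeta} \\
&= B(n-1, r-1, \theta) - B(n-1, s, \theta) \\
&\quad + \frac{n-1}{2n} \left[ B(n-2, r-2, \zeta) + B(n-2, s, \zeta) - B(n-2, r-1, \zeta) - B(n-2, s-1, \zeta) \right]
\end{aligned}
\]

where $\zeta \in \left( \theta, \theta + \frac{1}{n} \right)$. It follows that

\[
\begin{aligned}
A(n, \theta, r, s) &= B(n-1, r-1, \theta) + B\left(n, s+1, \theta + \frac{1}{n}\right) - B(n, r-1, \theta) - B(n-1, s, \theta) \\
&\quad + \frac{n-1}{2n} \left[ B(n-2, r-2, \zeta) + B(n-2, s, \zeta) - B(n-2, r-1, \zeta) - B(n-2, s-1, \zeta) \right].
\end{aligned}
\]

Differentiation of $B(n, k, p)$ with respect to $p$ shows that $B(n, k, p)$ increases for $p \in \left[0, \frac{k}{n}\right]$ and
decreases for $p \in \left[\frac{k}{n}, 1\right]$. As a result,

\[
\min_{p \in [\theta, \theta + \frac{1}{n}]} B(n, k, p) = \underline{B}(n, k, \theta), \qquad \max_{p \in [\theta, \theta + \frac{1}{n}]} B(n, k, p) = \overline{B}(n, k, \theta),
\]

leading to $\underline{A}(n, \theta, r, s) \le A(n, \theta, r, s) \le \overline{A}(n, \theta, r, s)$.

To bound $1 - c_+(\ell - 1)$ based on the bounds of $1 - c_+(\ell)$, we can use the relationship

\[
1 - c_+(\ell - 1) = [1 - c_+(\ell)] + [c_+(\ell) - c_+(\ell - 1)] = [1 - c_+(\ell)] + A(n, \theta_\ell, r_\ell, s_\ell).
\]

Similarly, we can bound $1 - c_-(\ell - 1)$ in terms of the bounds of $1 - c_-(\ell)$ by observing that

\[
1 - c_-(\ell - 1) = [1 - c_-(\ell)] + [c_-(\ell) - c_-(\ell - 1)] = [1 - c_-(\ell)] + A(n, \theta'_\ell, r'_\ell, s'_\ell).
\]

This concludes the proof of Theorem 2.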

C Proof of Theorem 3

Define

\[
C(p) = \Pr\left\{ \left| \frac{K}{n} - p \right| < \varepsilon p \right\} = \Pr\{ g(p) \le K \le h(p) \}
\]

where

\[
g(p) = \lfloor np(1 - \varepsilon) \rfloor + 1, \qquad h(p) = \lceil np(1 + \varepsilon) \rceil - 1.
\]

It should be noted that $C(p)$, $g(p)$ and $h(p)$ are actually multivariate functions of $p$, $\varepsilon$ and $n$.
For simplicity of notation, we drop the arguments $n$ and $\varepsilon$ throughout the proof of Theorem 3.
We need some preliminary results.

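Lemma 7 Let $p_\ell = \frac{\ell}{n(1 + \varepsilon)}$ where $\ell \in \mathbb{Z}$. Then, $h(p) = h(p_{\ell+1}) = \ell$ for any $p \in (p_\ell, p_{\ell+1})$.

Proof. For $p \in (p_\ell, p_{\ell+1})$, we have $0 < n(1 + \varepsilon)(p - p_\ell) < 1$ and

\[
\begin{aligned}
h(p) &= \lceil np(1 + \varepsilon) \rceil - 1 \\
&= \lceil n p_\ell (1 + \varepsilon) + n(1 + \varepsilon)(p - p_\ell) \rceil - 1 \\
&= \left\lceil n \left[ \frac{\ell}{n(1 + \varepsilon)} \times (1 + \varepsilon) + (1 + \varepsilon)(p - p_\ell) \right] \right\rceil - 1 \\
&= \ell - 1 + \lceil n(1 + \varepsilon)(p - p_\ell) \rceil \\
&= \ell = \left\lceil n \times \frac{\ell + 1}{n(1 + \varepsilon)} \times (1 + \varepsilon) \right\rceil - 1 = h(p_{\ell+1}).
\end{aligned}
\]

□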



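Lemma 8 Let $p_\ell = \frac{\ell}{n(1 - \varepsilon)}$ where $\ell \in \mathbb{Z}$. Then, $g(p) = g(p_\ell) = \ell + 1$ for any $p \in (p_\ell, p_{\ell+1})$.

Proof. For $p \in (p_\ell, p_{\ell+1})$, we have $-1 < n(1 - \varepsilon)(p - p_{\ell+1}) < 0$ and

\[
\begin{aligned}
g(p) &= \lfloor np(1 - \varepsilon) \rfloor + 1 \\
&= \lfloor n [ p_{\ell+1}(1 - \varepsilon) + (1 - \varepsilon)(p - p_{\ell+1}) ] \rfloor + 1 \\
&= \left\lfloor n \times \frac{\ell + 1}{n(1 - \varepsilon)} \times (1 - \varepsilon) \right\rfloor + \lfloor n(1 - \varepsilon)(p - p_{\ell+1}) \rfloor + 1 \\
&= \ell + 1 - 1 + 1 \\
&= \left\lfloor n \times \frac{\ell}{n(1 - \varepsilon)} \times (1 - \varepsilon) \right\rfloor + 1 = g(p_\ell).
\end{aligned}
\]

□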



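Lemma 9 Let $\alpha < \beta$ be two consecutive elements of the ascending arrangement of all distinct
elements of $\{a, b\} \cup \left\{ \frac{\ell}{n(1 - \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\} \cup \left\{ \frac{\ell}{n(1 + \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$. Then, both $g(p)$ and $h(p)$
are constants for any $p \in (\alpha, \beta)$.

Proof. Since $\alpha$ and $\beta$ are two consecutive elements of the ascending arrangement of all distinct
elements of the set, it must be true that there is no integer $\ell$ such that $\alpha < \frac{\ell}{n(1 - \varepsilon)} < \beta$ or
$\alpha < \frac{\ell}{n(1 + \varepsilon)} < \beta$. It follows that there exist two integers $\ell$ and $\ell'$ such that
$(\alpha, \beta) \subseteq \left( \frac{\ell}{n(1 - \varepsilon)}, \frac{\ell + 1}{n(1 - \varepsilon)} \right)$ and $(\alpha, \beta) \subseteq \left( \frac{\ell'}{n(1 + \varepsilon)}, \frac{\ell' + 1}{n(1 + \varepsilon)} \right)$. Applying Lemma 7 and Lemma 8, we have
$g(p) = g\left( \frac{\ell}{n(1 - \varepsilon)} \right)$ and $h(p) = h\left( \frac{\ell' + 1}{n(1 + \varepsilon)} \right)$ for any $p \in (\alpha, \beta)$.

□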



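Lemma 10 For any $p \in (0, 1)$, $\lim_{\eta \downarrow 0} C(p + \eta) \ge C(p)$ and $\lim_{\eta \downarrow 0} C(p - \eta) \ge C(p)$.

Proof. Observing that $h(p + \eta) \ge h(p)$ for any $\eta > 0$ and that

\[
\begin{aligned}
g(p + \eta) &= \lfloor n(p + \eta)(1 - \varepsilon) \rfloor + 1 \\
&= \lfloor np(1 - \varepsilon) \rfloor + 1 + \lfloor np(1 - \varepsilon) - \lfloor np(1 - \varepsilon) \rfloor + n\eta(1 - \varepsilon) \rfloor \\
&= \lfloor np(1 - \varepsilon) \rfloor + 1 = g(p)
\end{aligned}
\]

for $0 < \eta < \frac{1 + \lfloor np(1 - \varepsilon) \rfloor - np(1 - \varepsilon)}{n(1 - \varepsilon)}$, we have

\[
S(n, g(p + \eta), h(p + \eta), p + \eta) \ge S(n, g(p), h(p), p + \eta) \tag{13}
\]

for $0 < \eta < \frac{1 + \lfloor np(1 - \varepsilon) \rfloor - np(1 - \varepsilon)}{n(1 - \varepsilon)}$. Since

\[
\begin{aligned}
h(p + \eta) &= \lceil n(p + \eta)(1 + \varepsilon) \rceil - 1 \\
&= \lceil np(1 + \varepsilon) \rceil - 1 + \lceil np(1 + \varepsilon) - \lceil np(1 + \varepsilon) \rceil + n\eta(1 + \varepsilon) \rceil,
\end{aligned}
\]

we have

\[
h(p + \eta) =
\begin{cases}
\lceil np(1 + \varepsilon) \rceil & \text{for } np(1 + \varepsilon) = \lceil np(1 + \varepsilon) \rceil \text{ and } 0 < \eta < \frac{1}{n(1 + \varepsilon)}, \\
\lceil np(1 + \varepsilon) \rceil - 1 & \text{for } np(1 + \varepsilon) \ne \lceil np(1 + \varepsilon) \rceil \text{ and } 0 < \eta < \frac{\lceil np(1 + \varepsilon) \rceil - np(1 + \varepsilon)}{n(1 + \varepsilon)}.
\end{cases}
\]

It follows that both $g(p + \eta)$ and $h(p + \eta)$ are independent of $\eta$ if $\eta > 0$ is small enough. Since
$S(n, g, h, p + \eta)$ is continuous with respect to $\eta$ for fixed $g$ and $h$, we have that
$\lim_{\eta \downarrow 0} S(n, g(p + \eta), h(p + \eta), p + \eta)$ exists. As a result,

\[
\begin{aligned}
\lim_{\eta \downarrow 0} C(p + \eta) &= \lim_{\eta \downarrow 0} S(n, g(p + \eta), h(p + \eta), p + \eta) \\
&\ge \lim_{\eta \downarrow 0} S(n, g(p), h(p), p + \eta) = S(n, g(p), h(p), p) = C(p),
\end{aligned}
\]

where the inequality follows from (13).

Observing that $g(p - \eta) \le g(p)$ for any $\eta > 0$ and that

\[
\begin{aligned}
h(p - \eta) &= \lceil n(p - \eta)(1 + \varepsilon) \rceil - 1 \\
&= \lceil np(1 + \varepsilon) \rceil - 1 + \lceil np(1 + \varepsilon) - \lceil np(1 + \varepsilon) \rceil - n\eta(1 + \varepsilon) \rceil \\
&= \lceil np(1 + \varepsilon) \rceil - 1 = h(p)
\end{aligned}
\]

for $0 < \eta < \frac{1 + np(1 + \varepsilon) - \lceil np(1 + \varepsilon) \rceil}{n(1 + \varepsilon)}$, we have

\[
S(n, g(p - \eta), h(p - \eta), p - \eta) \ge S(n, g(p), h(p), p - \eta) \tag{14}
\]

for $0 < \eta < \frac{1 + np(1 + \varepsilon) - \lceil np(1 + \varepsilon) \rceil}{n(1 + \varepsilon)}$. Since

\[
\begin{aligned}
g(p - \eta) &= \lfloor n(p - \eta)(1 - \varepsilon) \rfloor + 1 \\
&= \lfloor np(1 - \varepsilon) \rfloor + 1 + \lfloor np(1 - \varepsilon) - \lfloor np(1 - \varepsilon) \rfloor - n\eta(1 - \varepsilon) \rfloor,
\end{aligned}
\]

we have

\[
g(p - \eta) =
\begin{cases}
\lfloor np(1 - \varepsilon) \rfloor & \text{for } np(1 - \varepsilon) = \lfloor np(1 - \varepsilon) \rfloor \text{ and } 0 < \eta < \frac{1}{n(1 - \varepsilon)}, \\
\lfloor np(1 - \varepsilon) \rfloor + 1 & \text{for } np(1 - \varepsilon) \ne \lfloor np(1 - \varepsilon) \rfloor \text{ and } 0 < \eta < \frac{np(1 - \varepsilon) - \lfloor np(1 - \varepsilon) \rfloor}{n(1 - \varepsilon)}.
\end{cases}
\]

It follows that both $g(p - \eta)$ and $h(p - \eta)$ are independent of $\eta$ if $\eta > 0$ is small enough. Since
$S(n, g, h, p - \eta)$ is continuous with respect to $\eta$ for fixed $g$ and $h$, we have that
$\lim_{\eta \downarrow 0} S(n, g(p - \eta), h(p - \eta), p - \eta)$ exists. Hence,

\[
\begin{aligned}
\lim_{\eta \downarrow 0} C(p - \eta) &= \lim_{\eta \downarrow 0} S(n, g(p - \eta), h(p - \eta), p - \eta) \\
&\ge \lim_{\eta \downarrow 0} S(n, g(p), h(p), p - \eta) = S(n, g(p), h(p), p) = C(p),
\end{aligned}
\]

where the inequality follows from (14).

□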



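By an argument similar to that of Lemma 6, we have

Lemma 11 Let $\alpha < \beta$ be two consecutive elements of the ascending arrangement of all distinct
elements of $\{a, b\} \cup \left\{ \frac{\ell}{n(1 + \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\} \cup \left\{ \frac{\ell}{n(1 - \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$. Then,
$C(p) \ge \min \{ C(\alpha), C(\beta) \}$ for any $p \in (\alpha, \beta)$.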



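Now we are ready to prove Theorem 3. Clearly, the statement about the minimum of the
coverage probability follows immediately from Lemma 11. It remains to calculate the number of
elements of the discrete set described in Theorem 3. Since

\[
a < \frac{\ell}{n(1 + \varepsilon)} < b \iff 1 + \lfloor na(1 + \varepsilon) \rfloor \le \ell \le \lceil nb(1 + \varepsilon) \rceil - 1,
\]

the number of elements in $\left\{ \frac{\ell}{n(1 + \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$ is

\[
\lceil nb(1 + \varepsilon) \rceil - \lfloor na(1 + \varepsilon) \rfloor - 1 < nb(1 + \varepsilon) + 1 - [na(1 + \varepsilon) - 1] - 1 = n(b - a)(1 + \varepsilon) + 1.
\]

Since

\[
a < \frac{\ell}{n(1 - \varepsilon)} < b \iff 1 + \lfloor na(1 - \varepsilon) \rfloor \le \ell \le \lceil nb(1 - \varepsilon) \rceil - 1,
\]

the number of elements in $\left\{ \frac{\ell}{n(1 - \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$ is

\[
\lceil nb(1 - \varepsilon) \rceil - \lfloor na(1 - \varepsilon) \rfloor - 1 < nb(1 - \varepsilon) + 1 - [na(1 - \varepsilon) - 1] - 1 = n(b - a)(1 - \varepsilon) + 1.
\]

Hence, the total number of elements of $\{a, b\} \cup \left\{ \frac{\ell}{n(1 + \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\} \cup \left\{ \frac{\ell}{n(1 - \varepsilon)} \in (a, b) : \ell \in \mathbb{Z} \right\}$
is less than $2n(b - a) + 4$. The proof of Theorem 3 is thus completed.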